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Abstract 

Recent experiments show that transcription factors (TFs) indeed use the facilitated diffusion 
mechanism to locate their target sequences on DNA in living bacteria cells; TFs alternate between 
sliding motion along DNA and relocation events through the cytoplasm. From simulations and 
theoretical analysis we study the TF-sliding motion for a large section of the DNA-sequence of 
a common E. coli strain, based on the two-state TF-model with a fast-sliding search state and a 
recognition state enabling target detection. For the probability to detect the target before dis¬ 
sociating from DNA the TF-search times self-consistently depend heavily on whether or not an 
auxiliary operator (an accessible sequence similar to the main operator) is present in the genome 
section. Importantly, within our model the extent to which the interconversion rates between 
search and recognition states depend on the underlying nucleotide sequence is varied. A mod¬ 
erate dependence maximises the capability to distinguish between the main operator and similar 
sequences. Moreover, these auxiliary operators serve as starting points for DNA looping with the 
main operator, yielding a spectrum of target detection times spanning several orders of magnitude. 
Auxiliary operators are shown to act as funnels facilitating target detection by TFs. 
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Introduction 


Ever since the publication of the Luria-Delbriick model on bacterial resistance due to 


pre-existing mutants |l| computational approaches to the dynamics of biological cells have 
contributed significantly to the advance of quantitative intracellular and cell population 
dynamics. Apart from the Luria-Delbruck model and its modifications j^, the facilitated 
diffusion model has become a key to the understanding of genetic regulation in prokary¬ 
otes. Following the observation of Riggs and co-workers 3| that in vitro lac repressors—one 
specihc regulatory DNA binding protein commonly called transcription factors (TFs)—find 
their specific target sequence (operator) on E. coli DNA at a surprisingly high rate, scientists 
have examined the prOTerties of the search of TFs for their target sequence. Early studies 
of Richter and Eigen were extended in the seminal work by Berg, Winter and von Hip- 


pel j^. Their facilitated diffusion model explained the high association rates of TFs as a 
result of repeated rounds of diffusion in the bulk solution and intermittent sliding along the 

1 novel single molecule 
13| and in living cells 


DNA. Interest in this model rekindled a decade ago j6l-lll|. along wit 
experiments conhrming the facilitated diffusion model in vitro 
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16|. 


Recent refinements of the facilitated diffusion model address molecular crowding effects 
both in the cytoplasm—reducing the TF-diffusivity—and along the DNA, where other (non- 
specifically) bound proteins impede the sliding motion of the TFs 17H2l|. To account for 


the speed stability paradox (22| TFs are believed to switch between the search state, in 
which the TF shuttles quickly along the DNA but is insensitive to the target, and the 
ow-diffusivity recognition state, in which the particle is able to detect its target sequence 
The active role of spatial DNA conformations was unveiled both experimentally 


23 


and theoretically 29l-l33|. Finally, the fact that genes, that interact via local TFs, are 
statistically proximate along the prokaryotic genome (colocalisation) was argued to be due to 


the increased interaction rates (rapid search hypothesis) 34 


36|. In line with the increasing 


knowledge of the microscopic details of gene regulation many computational studies appeared 
that go beyond the typical idealisations H HQ- 

Motivated by recent experiments showing that on encounter the target operator is not 
detected with certainty by a TF sliding along the DNA 1^, we here combine theoretical and 
simulations analyses to quantify the sliding motion of a TF along the real nucleotide sequence 
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of a common E. coli strain in the presence of crowding proteins on the DNA. We establish 
a model including search and recognition states of the TF in combination with the barrier 
discrimination model 


10 


24| with a position weight matrix (PWM) based binding energy 


39l |. We also include looping effects—as often studied in thermodynamic models 
e present model: the TF, for instance, the lac repressor dimer, can simultaneously 
bind to two operators, mimicking the intersegmental transfer mechanism BSQ. 


Blockers and movers, and the role of auxiliary operators 


We describe the sliding motion of a TF for its target operator along DNA, on which 
Abiock other proteins are bound, so-called blockers or roadblocks |l^. We focus on immobf 


blockers, keeping in mind that mobile blockers may add another layer of complexity 




The A^biock non-overlapping blockers are positioned randomly and partition the DNA into 
Abiock + 1 intervals. We assume that the TF cannot by-pass the blockers, see Fig. [H Where 
the DNA is not occupied by a blocker, the TF can bind to the DNA in two orientations. In 
the case of palindromic sequences the binding energies in both orientations are equal (see 
also the score values in Methods). 

We first focus on the processes in the target region carrying possible binding positions 
between the two nearest roadblocks to the left and to the right of the main operator 01. 
Such roadblocks could be proteins like H-NS or HU We only consider conhgurations in 
which the main operator is accessible. From both simulations and an approximate analytical 
approach we determine the probability pt that the TF detects the target in the correct 
orientation before dissociation. The TF starts from a random position in this target region. 


Simulation scheme 

We focus on base pairs 359,990 to 370,010 of E. coli strain K-12 MG1655 from eco- 
cyc.org 


43l |. comprising the genes lacA, lacY, and lacZ as well as the three operators 01, 


02, and 03, to which the lac repressor (Lad) can bind The sequence length is 10,021 
base pairs (bps). Since the binding motif of Lad covers w = 21 bps we obtain 10,001 possi¬ 
ble binding positions in two orientations. We choose Abiock = 71 blockers of size w to match 


the occupation fraction of Tabaka et ah 


2l| 
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FIG. 1: Scheme of TF search process along DNA (black line), which is partitioned by non- 
specifically bound roadblocks (red symbols). When TF (green symbol) is bound to DNA in the 
search mode, it can slide to a neighbouring position (orange arrows to the left and right) or intercon¬ 
version between search and recognition state occurs (grey arrows below TF). Finally, dissociation 
(pink arrow) may lead to re-association nearby (dash-dotted line) or onto another segment (dashed 
line). The main and auxiliary operators (targets for TF binding) are shown as blue rectangles. 


The general simulation scheme is depicted in Fig. [T] At each position the TF can be either 
in the loosely bound search state or in the tightly bound recognition state. In the search 
state the TF has four possible actions: the particle can move to the left or to the right, it 
can dissociate, or it can change to the recognition mode at its position. If the latter occurs 
at the position of the main operator 01, the corresponding time is saved as a first target 
detection. We later deal with dissociation from the DNA. Once in the recognition mode, 
we assume that the binding is so tight that the TF cannot move to neighbouring positions. 
As looping is neglected in this first, linear version of the model, its only option is to return 
to the search state at this position. The rates at which these transitions occur depend on 
the energetic barriers that need to be crossed during the internal protein dynamics. These 
are determined by the standard Gillespie algorithm g, [4^. Methods contains a detailed 


description of the simulations. Times are measured in units of the inverse attempt rate Aq 
from Eq. (|6]) in Methods. 


The energy Eg in the TF search state and the barrier Ebs for she 


base pair are assumed to be independent of the DNA sequence [lO 


ing to a neighbouring 


46| . The barrier Ebc,i 


to switch to the recognition state and the associated TF energy Er,i depend on the binding 
score (Methods) of the underlying sequence at the TF position i. We express E^^i and Ebc,i 
with respect to the reference scores E^. and Ebc, and we assume a linear relationship with 
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the score at the specihc position i, 


Er^i and E^Q^i Ej^q cxy^Si. ( 1 ) 

Here ASi = Si — {S) is the difference between the score at position i and the average score in 
the data set. 7 = —1.3378 is a proportionality factor (Methods). The volatility parameter 
a tunes the sensitivity of E^^^i to the DNA sequence. If a = 0 the barrier height does not 
change with the sequence and therefore this corresponds to blind testing of the sequence. 
If a = 1, an induced £t mechanism is at work. The closer the probed sequence is to that 
of the target, the faster the TF switches to the target-sensitive recognition mode since the 
barrier height changes exactly as much as the energy in the recognition mode. To obtain the 
target detection probability before dissociation shown in Fig. |2] (see Results), iVrun = 5 x 10"^ 
independent simulations starting from random positions in the target region were performed 
and it was counted in how many cases the target was reached. As we show here our model 
(IT]) for the energy score relation together with the additional element of the volatility a 
elucidate the role of the sequence sensitivity in the speed stability tradeoff of TF search 
processes. 

Theoretical approach 

We compare the simulations results of Fig. [2] to a theoretical model based on a target 
region with N possible binding positions. For mathematical details see Methods. 

The fundamental parameters are the sliding rate F to neighbouring positions, the rate 
kst of a conformational switch to the recognition mode at the target site resulting in direct 
target detection, and the dissociation rate kos from any site. At all non-target positions we 
assume constant rates for the changes between recognition and search modes, denoted by fcgr 
and kj-s- We place the target at bp m and the TF starts at a random position. As detailed 
in Methods these quantities determine the mean target detection time r 7 v,m (see below) and 
the probability to reach the target before dissociation pt{N,m), written as 

|JVp,(JV.m)]-‘ = + [1 + G(f)]-‘. (2) 

where F = T/k^s and kgt = Kt/koE- The function G is defined via a series expansion in 
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Q. 



N 


FIG. 2: Probability pt to detect the target before dissociation as function of the target region 
length N. Symbols: simulations using 500 different configurations with 50,000 runs for each. 
Lines: simplified theoretical model with a centred target (full lines) and a target at the boundary 
(dashed lines). Parameters (in units of ksT): = 0, E},c = A, Eg = —7, Fibs = —6. Colours: cyan 

(a = 0.1), green {a = 0.2), blue (a = 0.3), black (a = 0.4) and red {a = 0.5). 


Eq. ffTT)) (Methods). For y = 1 + e — ■\/e{2 + e) with e = l/(2r), we find that 

cosh([iV - (m - |)] ln( 2 /)) cosh([m - i]ln( 2 /)) 


(1 + G(r)) = tanh 


V 2 / 


cosh(Af ln(|/)/2) sinh(iVln(|/)/2) 


( 3 ) 


obtained by Kolonieisky et al 


extends the result of Refs 


• E?, 


47 


48| and studied experimentally in Ref. Thus Eq. 


48| to the more genera 


case when the target is not detected 


with 100% efficiency, as revealed in recent experiments 
the mean search time TN,m is (see Eqs. fl^- ffT7l) in Methods) 


15|. Introducing the ratio q = kgr/Ks, 




l + (l + g) 


G + IfJ X dG/dT 


kst E 1 + G 


( 4 ) 
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Results I 


Target detection probability 

Simulations results for the target detection probability pt are shown in Fig. [2] for hve a 
values between 0.1 and 0.5. We do not consider larger a values since already for a = 0.5 
there is no longer an energy barrier to be crossed at the target site and thus no more changes 
are observed. Lines of matching colour in Fig. [2] are results of the analytical model, Eq. ([2]). 
The target is either centred (full lines) or located at the boundary of the target region 
(dashed). 

The simulated data scatter nicely between the two limiting theoretical lines for centred 
and boundary target positions over three orders of magnitude in the size N of the target 
region, pt decreases monotonically with N, as large target regions on average imply longer 
paths which have to be traversed en route to the target, implying a higher risk to dissociate. 
Larger a values, corresponding to a searcher which checks more often for the target, lead to a 
higher detection probability. Another effect of a concerns the influence of the target position. 
For small values of a the corresponding curves nearly coincide, i.e., there is no signihcant 
target position dependence. For higher a values, centred targets effect a substantially higher 
detection probability as the full lines lie above the corresponding dashed ones. Thus, only 
when the target detection probability on an individual encounter reaches substantial values, 
a suitable position of the target pays off. 

We see that for the target detection probability the theoretical model, in which all energies 
on non-target sites are replaced by average values, nicely reproduces the results of the 
simulations based on sequence specihc binding energy values. 

Target detection time 

In Fig. [3] the mean detection times to the target are shown for the same a values 
used in Fig. |2l Since the particles can dissociate, TN,m is a conditional time: given that the 
particle detects the target with the probability shown in Fig. [2l at what time will this occur 
on average. The symbols in Fig. |3]show the simulations results, the lines correspond to the 
theoretical model with a centred target (full lines) and a target at the boundary (dashed). 

The features of Fig. [3]fall into two cases. For N < 100, as with the detection probabilities 
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FIG. 3: Mean first target detection time at 01 as function of the target region size N. Symbols: 
simulations. Lines: theoretical model with a centred target (full lines) and with a target at the 
boundary (dashed lines). For small a values dashed and full lines nearly coincide. Colours as in 
Fig. [21 Due to the presence of the auxiliary operator 03 for N > 100 a second branch of results 
emerges. Parameter values (in ksT): = 0, E^c = 4, = —7, Fibs = —6. 

above the simulations agree well with the theoretical model for all a values. Again, a clear 
ordering with a occurs: volatile TFs (large a) find the target quicker than nearly blind TFs 
with a = 0.1 (cyan). Moreover, only in the case of large a, when individual encounters with 
the target have a substantial probability for target detection, the target position comes into 
play (e.g., for the red lines). This is one of our central results. 

For N > 100, apart from simulations data consistent with the theoretical lines a second 
branch of results appears with target detection times nearly two orders of magnitude longer 
than expected. This effect can be rationalised by the presence of the auxiliary operator 03 
in the target region. It resides 92 nucleotides away from the main operator 01 such that only 
target regions with a size larger than that can contain both operators If both operators 
are in the target region, the TF can change to the recognition mode at the auxiliary operator 

^ In this simplified model focusing on the target region, the stronger auxiliary operator 02 does not play a 
role, since it has the inverse orientation of 01 and 03 and we do not allow for orientation changes while 
the TF is bound to DNA. 
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FIG. 4: Logarithmic histogram of energy values in the recognition mode at all 10,001 positions in 
both orientations (blue and red) for Er = 0, 'j = —1.3378. 

and thus become trapped away from the main operator. Such time consuming checks for the 
target may occur at any non-target position. However, at 03 this is particularly severe since 
it has a rather strong binding energy (see Fig. 0]). The gapped energy spectrum yields search 
times which are way above the values of the theoretical model, since the latter assumes all 
non-main target sites to be energetically equivalent. 

Inspection of the upper branch of the results in Fig. |3] indicates that it barely contains 
data obtained with small a values (cyan and green). This can be explained by comparison 
with Fig. |2) in these cases even the probability to detect 01 is rather small. This effect is 
even more pronounced for the considerably weaker 03. However, when such TFs change 
to the recognition state at the auxiliary operator, they will spend more time there than 
particles with a larger a, since these face a larger barrier to be crossed (Eq. ([T])). As not 
all target regions of size N > 100 contain the auxiliary operator, the lower branch of results 
still coexists. Here the conditional target detection time increases with N but levels off to 
a plateau. 

Conversely, for rather volatile searchers (red data points) in regions comprising both 
operators, for N G [100,150] there is a slight tendency that the mean search time decreases 
with N. This results from the fact that these regions, which are only marginally longer 
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than the distance between the two operators, by definition have both operators near the 
boundaries. This yields longer search times, similar to the case of shorter target regions, for 
which the dashed lines are always above the corresponding full lines in Eq. [31 We consider 
the influence of the location of the operators with respect to the non-specific blockers in 
more detail in the following paragraph. 

Preference of 01 over 03 

In the hypothetical situation of two equally strong operators in the target region, only 
their relative position in the target region would influence which one of them is more likely 
to be detected first. The biologically relevant situation considered here with two different 
operators is more subtle. When both 01 and 03 are in the target region we registered which 
one was detected first. The preference for 01 shown in Fig. [3] is given by the probability 
that 01 is detected first. The shift by 1/2 leads to positive values when the probability is 
larger to detect 01 first. 

To single out geometrical effects, the axis x quantifies which of the two operators is more 
central in the target region and thus has—from a geometric point of view—higher chances 
to be hit first. We define x = |x 3 — 1/2| — |xi — 1/2|, where Xi denotes the relative position 
of operator Oi, {i = 1, 3) in the target region. The x values range between —0.5 and -1-0.5, 
positive values corresponding to a favourable position of 01. 

As expected, since 01 is the stronger operator, most of the data points are positive. For 
small a values (cyan and green in Fig. [5]) it is more probable to detect 01 first, but the 
relative positions of the two operators are not significant. Increasing the volatility from 
a = 0.1 to 0.3 leads to a monotonic increase in the accuracy of discrimination between the 
two operators. For even larger values of a this accuracy decreases, since now the particle 
checks for the target often enough to detect the auxiliary operator with sufficient probability. 
Then, geometric effects become more important, as seen from the increasing slope of the 
black dotted line for a = 0.4 and the red dot-dashed line for a = 0.5. In the latter case, 
some negative values of the preference are observed, indicating that a volatile TF is more 
likely to detect the auxiliary operator first, if its position is much closer to the centre of the 
target region. 

Intermediate a values enable the TF to detect the main operator first, without losing 
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FIG. 5: Preference of first detecting 01 and not 03 as function of the centrality x of the position 
of the two operators. Data constitute a moving average over each neighbouring 21 data points. 
Colours as in Figs. [2]and[3l a = 0.1 (full cyan line), a = 0.2 (green, long dashes), a = 0.3 (blue, 
short dashes), a = 0.4 (dotted, black) and a = 0.5 (red, dot-dashed). 

time from binding to the auxiliary operator. In terms of the search model presented so far, 
the occurrence of 03 appears like a design bug instead of a useful feature, since it delays the 
detection of the main operator. We now show that auxiliary operators in a more realistic 
scenario indeed act as funnels for TFs towards the main binding site. 


Auxiliary operators make sense in presence of looping 

As evident from Figs. [3] and [5] the presence of the auxiliary operator 03 in the target 
region signihcantly influences the rate of target detection. In an extension of our model 
several conhgurations can be distinguished depending on whether or not the two auxiliary 
operators are accessible. In a living cell the occupation with non-specihc binders and thus 
the probability for a particular blocker conformation change in time. 

To model the complete search process of a TF with two binding motifs such as Lad, 
we consider what happens after a dissociation from DNA. After dissociation the time spent 
in 3D is assumed to be exponentially distributed with mean time Tb- For the jump length 
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Xjump—like all the following lengths measured in bps—on the DNA effected by 3D excursions 
we assume the cumulative distribution 


C'( 


X 


jump J 


( ^jump 
^min 


1 -/ 3 - 



(5) 


characterised by the minimal jump length Xmin = 0.01, the maximal jump length Xmax = 
2.3 X 10®—corresponding to half the E. coli genome size, and the scaling exponent (3 = 1.2 
characterising the looping properties of the repressor. Scaling laws of the form f{x) ~ 
for the length x stored in a random loop formed by a polymer chain occur due to the 


while in the presence of excluded volume interactions the exponent increases to rj 


52| . Here we chose the lower exponent rj = 1.2 following the data by Priest et ah 53|. To 


2.2 

50- 

531. 

To 


obtain the cumulative distribution ([S]), we integrate the power law f{x) in between the lower 
and upper cutoffs Xmin and Xmax, and normalise this expression. Note that our results are 
not overly sensitive to the exact value of the exponent /3, as in the free energy it corresponds 
to a logarithmic dependence on x. 

Here we assume that a power law similar to Eq. ([5]) also applies to the jump statistics. 
Whenever the particle jumps out of the 10 kbps range that we study, we place it at a random 
position in our system, mimicking the complete loss of correlation with the dissociation 
position for long jumps. Unlike during sliding motion, it can change the orientation during 
a 3D relocation. To simplify matters we coarse-grain events outside the target region, since 
we are not interested in the sliding motion far away from the target. We then hrst simulate 
the mean dissociation times from all Wiock regions that do not contain the target. To this 
end, simulations are performed as outlined in the paragraph Simulation scheme, where the 
code is run /avg = 20 times multiplied by the length of the corresponding interval measured 
in bps to guarantee reasonable statistics. 

Whenever the TF detects and binds to one of the auxiliary operators, apart from returning 
to the search state at this position there is the possibility to form a DNA loop with 01. 
For this event to occur, an initiation time is drawn from an exponential distribution with 
mean rinit, which is assumed to be the time needed to form a non-specihc complex with 
the target region. To keep the number of parameters as low as possible we assume these 
initiation times to be equal for both auxiliary operators. To the loop initiation time we 
add a time lag, since after landing with its second half in the target region, the TF has 
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to actually detect the main operator. The latter is obtained from a simulation as dehned 
in Simulation scheme. The same process is possible the other way round: starting from 
binding to the main operator and, before switching to the search mode, closing a loop with 
one of the auxiliary operators. To simplify matters we do not model direct looping between 
the auxiliary operators. The times for releasing a loop are calculated similarly to the mean 
dissociation times above. We now study the full model with looping for N ^ 180. 


Results II 


Influence of the volatility 


Of particular biological interest are the time spans Tfree during which the operator is 
unoccupied, as in these intervals RNA polymerase can bind to the promoter and start 
transcription. We start with a conformation in which looping is precluded by blocking both 
auxiliary operators with non-specihc binders. In Fig. [Hlthe distribution of r^ee is shown for 
four values of the volatility parameter a. 

In all cases we obtain two distinct peaks separating a short and a long time scale. For 
increasing values of a the hrst peak, located at around 10° time units, grows relative to the 
second one, located at around 10° time units. Since the total simulation time was hxed, 
the total number of events grows as well: The peak at short times is due to events when 
a TF, after switching from the recognition to the search mode, performs just a few sliding 
steps before returning to the recognition state at the target. Conversely, the long time peak 
corresponds to events when a TF dissociates, possibly multiple times, from DNA and loses 
correlation with the unbinding position, and thus leads to long time spans, in which the 
target operator is vacant. That the hrst peak gains in importance for larger values of a is 
due to the fact that, as seen above, the individual target detection probability is higher in 
that case. 

We note that to initiate transcription, RNA polymerase must bind the promoter while 
the TF is not at the operator. If the repressor rebinds to the operator before an RNA 
polymerase manages to And the promoter, the cell does not “feel” these quick occupancy 
fluctuations and experiences only a sin gle effective binding event of the repressor, and no 
transcription takes place (compare Ref. 54l|h 
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FIG. 6: Distribution of time spans Tfree during which the operator is free of repressor in a system 
without looping. The abscissa shows the logarithms of the time spans during which the operator 
is accessible such that bins at larger Tfree values are wider. Parameters: = 50e, rinit = lOr;,. 

Here e = 2.718... is Euler’s number, further information on the used parameters is provided in 
the Methods. The total simulation time is Tmax = 3 x 10^^. We use four values a = 0.2 (green, 
dash-dotted), 0.3 (blue, short dashes), 0.4 (cyan long dashes) and 0.5 (black, full line). 

Influence of looping and the average time spent in 3D 

We now choose a configuration in which the auxiliary operator 02 is vacant and we hx a 
to a value of 0.5. The corresponding results are shown by black lines in Fig. [7l Full lines are 
for the same values of tj, and Tinit as in Fig. IQ dashed lines represent the case when both are 
ten times larger. We observe that both full lines still feature two peaks centred at ~ 1 and 
10® time units. Between these there appears a new peak at intermediate times. Given that 
the loop initiation time in this case is Tinit ~ 1-36 x 10^, these events can be self-consistently 
interpreted as return events to the target due to looping: the DNA was looped between the 
main and an auxiliary operator, dissociates from the main operator and reestablishes the 
loop. 

That the peak for fast rebinding events has a reduced size is due to the fact that our 
looping algorithm counts all fast fluctuations of the occupancy during which the loop still 
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FIG. 7: Distribution of r^ee in systems where looping is possible involving 02 (black lines) and 
both 02 and 03 (red lines). Full lines: r^, Tinit, Tmax as in Fig. [6l Dashed lines: = 5000e, 

"^init = lOrt,. In all curves: a = 0.5. 


exists as a single long-lived event. In the simulations without looping these events appeared 
explicitly (Fig. |6]). Accordingly, the remaining events in the reduced hrst peak correspond 
to target rebinding without an existing loop to an auxiliary operator. Given that fast re¬ 
binding has no biological meaning, looping introduces a new intermediate time scale, and 
typical return times to the operator are greatly reduced, resulting in improved repression. 
This agrees with the observations of Choi and co-workers according to which DNA looping 
enables the cell to regulate gene expression on many time scales via distinct forms of dis¬ 
sociation events Comparing this behaviour to the black dashed line in Fig. [7l when 


both 3D excursions and looping take around ten times longer, shows that both the looping 
peak and the rightmost peak are shifted to larger times underlining the physicality of our 
interpretation. 

If both auxiliary operators are accessible and 03 is in the target region (red lines in 
Fig. [^, the results are similar to the previous ones (Fig. [6]). Three peaks are observed, and 
increase of u and Tinit shifts the peaks—apart from the rt,-independent fast rebinding peak— 
to the right. There is one major difference between the two settings: When both auxiliary 
operators are accessible, the size of the third peak is nearly as large as the second one. Thus, 
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very long return times occur more often when both auxiliary operators are present. The 
significant changes of the target search times in the presence of the auxiliary operators are 
our other central result. 


Discussion 


One-dimensional sliding of a TF along the DNA is a vital ingredient of the facilitated 
diffusion model. Sliding is indispensable in the final step of the search for the specihc binding 
site by the TF, namely, the recognition of the binding sequence. For a real bacterial DNA 
sequence we here analyse in detail the dynamics of the TF sliding in a region around the 
main operator in the presence of roadblocks, e.g., proteins like H-NS or HU For a 

minimal set of parameters we unveil the role of the density of the roadblocks and the DNA 
sequence on the detection speed of the target sequence. Our results underline the special 
role played by auxiliary TF operators. These auxiliary operators act as a funnel for the TF 
to facilitate the target search in the nucleotide sequence. 

More specihcally, combining a simplihed theoretical model and simulations we follow a 
TF moving in a region around the main operator delimited by two non-specifically bound 
roadblocks while switching between a search mode, in which it shuttles along the DNA while 
being blind to the target, and a recognition mode, in which it cannot move along DNA but 
which is essential to detect the target. The interconversion rates depend on the underlying 
sequence. Motivated by recent experiments showing that not every target encounter leads 


to detection of the target sequence 


15| , we interpolated between the extreme cases of nearly 


blind switching between the modes and an induced £t situation, in which the energetic 
barrier to be crossed changes as much as the specific binding energy. Numerical results 
for the probability to detect the target before dissociation and for the mean detection time 
demonstrate impressive agreement with our theoretical model (see Fig. [Hand upper branch 
in Fig. |3]), as long as no further binding sites of similar strength are present. If an auxiliary 
operator is within the target region, an intermediate rate of checking for the target yielded 
the highest accuracy in discrimination between main and auxiliary operators. However, while 
auxiliary binding sites act as traps in the simplified model, in the more realistic situation 
when DNA looping is allowed, they can be seeding points for the formation of loops joining 
two operators. In the second part we therefore included looping in the simulation. For our 
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parameters, this leads to quick rebinding events to the main operator and thus increases 
significantly the local effective TF density, in accordance with classical observations. This 
approach can be easily transferred to other TFs with known binding motif. 

Given the fairly large number of the parameters involved and the complexity of the dy¬ 
namics conveyed by the broad range of apparent time scales, definite quantitative statements 
of this problem are hard to give. Furthermore, the target search here was modelled for a 
single TF, while in a living bacteria cell approximately a dozen lac repressors perform this 
task simultaneously. Additionally, other TFs could partially block the specific binding site 


of the TF under consideration, and could impec 
As recent studies showed (for instance, see Ref. 


he establishment of the lac specific loop. 
20|) the effects of additional binders are not 


always obvious and require careful analysis. Since the binding to the operator(s) is rather 
strong, it is questionable to assume that the TFs are independent and we face a multi-body 
problem. However, the concentration effects and the expression output in terms of the oc¬ 
cupancy of all three operators were successfully studied in terms of thermodynamic models 
40(] using similar language. As our simplified theoretical model for events in the target re¬ 
gion yields such a good agreement with the numerical simulations and given that more and 
more quantitative experimental results appear, it seems to be a logical extension to equip 
thermodynamic models with rates obtained from our model presented herein. Moreover, 
the accessibility of the three operators could be modulated in time to mimic the mobility of 
nonspecific binders which can block the operators. In this spirit we believe that the results 
reported herein represent an important step forward toward the quantitative understanding 
of gene regulation in living prokaryotic cells, and form the basis for future, more detailed 
models. 


Methods 

Here we describe the simulations method and the calculations for the above results. 


Numerical simulation of the simplified model 

The TF is present in either the loosely bound search state or tightly bound recognition 
state. In the search state at position i the TF can either slide to the neighbouring sites i — 1 
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or i + 1 while remaining in the search state, it can dissociate or switch to the recognition 


state at the same position (Fig. [T]). Such a switching event at the target site 
01) corresponds to detecting the target. This differs from the approach of Ref. 


the operator 


56l | , in which 


a further target detection step was used after changing to the recognition state at the target 
site. If the particle is next to a blocker and tries to move onto the excluded site, the move is 
cancelled. The standard Gillespie algorithm is used to draw the rates for the above events. 


A central role is played by the energetic barriers which need to be crossed, measurec 
of ksT with respect to the unbound state of zero reference energy (similar to Refs. 


in units 
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67!). 


In the recognition state we assume the TF to be immobile. In the first version of the 
model without looping the TF can only return to the search state. Generally when going 
from state a with energy Ea to state b with energy E/,, separated by an energetic barrier 
Eba, the rate, kab for this step is given by 


kab = Ao X exp(-/3AE), 


( 6 ) 


with Ai? = max{Eb — Ea, Eba — Ea, 0}. In absence of a barrier {Eba < Ea) and when the 
energy of the final state is smaller than that of the initial state {Eb < Ea) the reaction is 
assumed to occur with attempt rate Aq, which is the inverse of the elementary time step 
in which all times are measured. To convert our results to real times, this time step can 
be related to the known ID diffusion coefficient of a given TF. We note that our approach 


differs from the convention of Ref. 


58 


59| . in which the specific binding barrier has to be 


crossed each time the TF slides to a neighbouring position. 

We fix the energy difference between the specific binding energy at the main operator 01 
and the energy in the search state as 15.3 60|. With the choice Eg = —7 applied in the main 
text this implies Eqi = —22.3 (Fig. 0]). The proportionality factor 7 can be determined once 
all values of the score matrix are known via the above mentioned demand Eg — Er^oi = 15.3 




a. Score matrix The score matrix is obtained from standard methods and calculated 
for both orientations in which the TF can bind: the PWM score S' of a putative in the most 
general form is written as 




j=i 


1 fijj + 

p{lj) Nbg + J2b4b) 


( 7 ) 
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where w denotes the length of the binding motif, Ij the nucleotide at position j in the input 
sequence, p{b) the background frequency of base b, Nbs the number of known binding sites, 
and s{b) a pseudo-count function. 


In the following we stick to the convention used by Vilar 


62| . namely, a = e (where 


e = 2.718... is Euler’s number), p{lj) = 1/4 for all Ij (all nucleotides appear with equal 
probability) and s{b) = 1 for all b (we use the same pseudo-count function for all four types 
of nucleotides). Given that there are Nbs = 3 known operators to which the repressor binds 
(commonly denoted by 01, 02 and 03), this yields 
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S = ^lti(4(/,„, + l)/7). 

i=i 


( 8 ) 


For the three known operator sites the scores are [ 6 ^: Sqi = 13.38, S 02 = 12.17, and 
So 3 = 10.95. A histogram of the energy values in the recognition state for the 10,001 binding 
positions surrounding the 01 operator is shown in Fig. HI where Er = 0 and 7 = —1.3378 (a 
proportionality constant translating score differences into energetic differences) were chosen 
such that Er^oi = —22.3. At the lower end of the energy spectrum the three operators can be 
recognised. Note that there is an energetic gap to all other binding sites, see the discussion 
of such a gapped situation in Ref. [63 1. 


Simplified theoretical model 

The simplihed theoretical model includes N possible binding positions, N being an odd 
number. This way a central node exists, but an analogous calculation can be done for even 
N. Applying the scheme of possible reactions we have the following differential equations 
for the probability density C]\fj{t) of TFs in the search state at base pair j at time t and the 
corresponding probability density pNj{t) of TFs in the recognition state, when the TF is at 
bp m, 

= T [cArj_i -f Catj+i — (2 — — ^j,Ar)cArj] 

fcoffCVjj [^sr T bj ^sr)]CVj 

T^rs(l (^) 
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and 


dpN,j 

dt 


[^sr ^sr)]cAf,j 

fcrs(l dj^m)PN,j 


( 10 ) 


This set of equations is more conveniently treated in Laplace space with respect to time, 
where we denote the variable conjugate to t by n and the corresponding functions with a 
tilde. For convenience we omit the explicit argument u in the following. 

If the particle starts its motion in the search state, initially the probabilities p^j vanish. 
This is due to the simple proportionality between patj and 

^sr T dj YniJ^st ^sr) i N 

In particular, at the target site pN,m = kstCN,m/u, and at all other sites pNj^m = 
ksrCNj^m/{u + k^g) ■ Solviug this system of equations amounts to finding the solution of 
a tridiagonal matrix system. Of particular interest is the probability at the target site 
encoding the Laplace transform of the flux to the target. 


jN,m kgtCN,m '^^PN,m- (1^) 

In the following we introduce a temporary additional index for j, c and p denoting the 
node on which the particle starts, taken to be n. With the auxiliary function C,{u) = 
kst + ^off + u the flux to the target becomes 


jN,m,n kgtCN,m,n '^PN, 171,71 

N-1 „ 

kst(^i,N,m,n^ 


i=0 


N-1 

E 

i=0 


N 


c - 1 a 

i,N,m,m ^i,N,Tn,n' 


n'=l 


(13) 


r‘ 


where quantities with a hat are obtained by dividing the corresponding quantities without 
hat by the auxiliary function ^{u) = kos + m(1 + ksr/{u + /crs))- The parameters ai^N,m,n are 
given by 


for n <m, and 


n+min{—1,2—m} 
j=max{0,r2+2—A^} 


mm{N,i-\-m}—n 
ji=max{0,2— 72 + 1 } 


2(n - 1) - j\ f 2{N -m) - Anm - A 

Ayim T Ajj 


2{N - n) - j\ /2(m - 1) - Aij + A^ 
j /V ^nm 




(14) 


(15) 
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for m <n. Anm = n — m and similarly A^- = i — j. 

For a homogeneous initial distribution we omit the last index for the starting position of 
the TF and 

1 JL^ 

( 16 ) 


jN,m jy jN,m,r, 

n=l 


which can be Taylor expanded of up to hrst order in u yielding the probability pt{N,m) to 
reach the target before dissociation as well as the mean (conditional) target detection time 
TN,m given by Eqs. ([2]) and (jl]). The function G is dehned by the series expansion 


G(f) = 


N-l 

Y. I Y (^N,i,m,n I r* 

i=0 \ n^m 


N-l 

Y 

i=0 


(17) 




where F = F/ k^s- Note that the auxiliary function G does not depend on the target detection 
rate kst, but only on the geometry of the system via ai^N,m,n and on the hopping dynamics 
encoded in F. For a centred target, m = {N + l)/2, and in the limit kst ^ kos the target 
detection probability simplihes to 


/ / X , N tanh(^ ln(|/)) 

Pt{N,{N + l)/2)= 


N tanh(i ln(|/)) ’ 


(18) 


reminiscent of Ref. 641. 

For the conditional mean search time for a centred target in the limit of vanishing disso¬ 
ciation rate koS —t 0, we obtain G ^ N — 1 and dG/dV —)■ N{N‘^ — 1)/[12F^], such that via 
Eq. dH), 

1 r /V -I-1 1 1 

(19) 




N+1 1 

12F 


In this limit the existence of the recognition state away from the target simply slows 
down the mean target detection time via the prefactor 1 -|- g of the second term. In the 
limiting case g —)■ 0, when the recognition state is never entered unless the particle is on 
the target site, this result reduces to the classical solutions for incoherent exciton hopping, 
tv,(7v+i)/2 = N/kst + {N - 1)/[12F] 65]. 
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